Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation
Authors
Abstract
One of the biggest challenges in speech synthesis is the production of contextually appropriate, natural-sounding synthetic voices. This means that a Text-To-Speech (TTS) system must be able to analyze a text beyond sentence boundaries in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identification, followed by speaking-style synthesis according to the detected discourse genre. For the final implementation, a set of four genres and their corresponding speaking styles was considered: broadcast news, live sport commentaries, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted to the neutral voices of other speakers not included in the training database. When compared with the neutral voices, the transplanted styles were significantly preferred, and the similarity to the target speaker was as high as 78%.
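The two-step approach described in the abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' method: the abstract does not say how genres are identified, so a simple cue-word count stands in for the classifier, and the names `GENRE_CUES`, `identify_genre` and `plan_synthesis` are hypothetical. Only the four genre labels come from the abstract.

```python
import re
from collections import Counter

# The four genres named in the abstract. The cue words are invented
# for illustration only; the real system's features are not described.
GENRE_CUES = {
    "broadcast_news": {"reported", "officials", "announced", "today"},
    "sport_commentary": {"goal", "pass", "shoots", "referee"},
    "interview": {"you", "think", "tell", "why"},
    "political_speech": {"citizens", "nation", "we", "promise"},
}


def identify_genre(text: str) -> str:
    """Step 1 (assumed): score the whole text, not single sentences,
    and return the genre whose cue words occur most often."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    scores = {
        genre: sum(tokens[word] for word in cues)
        for genre, cues in GENRE_CUES.items()
    }
    return max(scores, key=scores.get)


def plan_synthesis(text: str, target_speaker: str) -> dict:
    """Step 2 (assumed): pair the detected genre's speaking style with
    the target speaker's voice. Actual waveform synthesis is out of scope."""
    return {
        "speaker": target_speaker,
        "style": identify_genre(text),
        "text": text,
    }


if __name__ == "__main__":
    plan = plan_synthesis(
        "The striker shoots and it is a goal! The referee agrees.",
        "speaker_A",
    )
    print(plan["style"])
```

The point of the sketch is the separation of concerns: the style decision depends only on the text, while the speaker identity is passed through unchanged, which mirrors the idea of transplanting a style onto a speaker's neutral voice.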
Similar resources
Automatic prosodic modeling for speaker and task adaptation in text-to-speech
One of the most important demands for future TTS systems is their ability to improve naturalness when embedded in a particular task or application that requires a particular speaking style for a particular speaker. In this paper, we present a new prosodic modeling procedure for improving naturalness by adapting a TTS system to a new speaker and a new speaking style. The proposed procedure is an...
SDBM-Based Speaker Recognition for Speaking Style Variations
There are many factors corresponding to performance degradation of an actual speaker recognition system. Mismatch in speaking style of a target speaker during training and testing is an important one. When a client enrolls in a system, it is natural for him/her to speak in a spontaneous way. However, it is difficult to maintain the same speaking style throughout test phases. In view of this sit...
A Model for Varying Speaking Style in TTS systems
This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, spee...
Adding speaking style to a TTS system
This paper aims to enhance the performance of a TTS system by generating various speaking styles. First we describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features found in these authentic styles with the prosody in “neutral” speech uttered by the eLite TTS system ([1]). Differences concern about 20 prosodic characteristics (F0 span, spee...
Design and evaluation of validity of an electronic alternative and augmentative communication system for Persian-speaking children
Introduction: Due to the high prevalence of communication disorders, augmentative and alternative communication methods are one of the options available for addressing the problems of people with these disorders. Since there are no complex tools for Persian-speaking children with communication disorders, we decided to design communication assistant software for these children that produces sound output. Materials and Meth...